
feat(reader): lazy DateTimeParts reassembly #49

Merged: dfa1 merged 1 commit into main from feat/datetimeparts-lazy on Jun 16, 2026

dfa1 commented Jun 16, 2026


Summary

Convert vortex.datetimeparts decoding from a non-functional GenericArray wrapper into a working lazy LazyDateTimePartsLongArray.

Background

The pre-existing decoder returned a GenericArray holding the (days, seconds, subseconds) children, but no consumer in the extension-decode path (ExtensionStorage, TimestampExtensionDecoder, DateExtensionDecoder) knew how to reassemble that shape back into the epoch count their accessors expect. The path was effectively dead at scan time: the encoder tests round-tripped the children individually but never reconstructed an epoch.

Changes

  • LazyDateTimePartsLongArray (record, implements LongArray) holds the three children plus precomputed unitsPerDay / unitsPerSecond multipliers. getLong(i) = days[i] * unitsPerDay + seconds[i] * unitsPerSecond + subseconds[i]. forEachLong / fold use the same per-row path.
  • DateTimePartsArrays (package-private) centralises the per-row signed-long read so each child can use whichever ptype the encoder picked (Byte/Short/Int/Long, optionally wrapped in MaskedArray).
  • DateTimePartsEncodingDecoder parses the parent Extension's TimeUnit metadata byte, computes unitsPerSecond = TimeUnit.divisor() (Days unit falls back to 1; seconds/subseconds children are zero in that case) and unitsPerDay = 86_400 × unitsPerSecond, then constructs the lazy record.
  • DateTimePartsEncodingEncoderTest asserts the reassembled epoch values instead of the (now hidden) per-child breakdown — testing the behaviour the encoder actually guarantees.
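
For readers without the diff at hand, the reassembly described above can be sketched roughly as follows. This is a minimal illustration, not the merged code: `LongColumn`, `PlainLongs` and `LazyDateTimeParts` are hypothetical stand-ins for the project's `LongArray` interface, its child arrays and `LazyDateTimePartsLongArray`, and the real types carry more methods (dtype, forEachLong, fold).

```java
// Hypothetical stand-in for the project's LongArray interface (assumption).
interface LongColumn {
    int length();
    long getLong(int i);
}

// Simplest possible child for the sketch: a view over a long[].
record PlainLongs(long[] values) implements LongColumn {
    public int length() { return values.length; }
    public long getLong(int i) { return values[i]; }
}

// Lazy reassembly: the children are held as-is and recombined on each read,
// so no output buffer is allocated and no row is copied up front.
record LazyDateTimeParts(LongColumn days,
                         LongColumn seconds,
                         LongColumn subseconds,
                         long unitsPerDay,
                         long unitsPerSecond) implements LongColumn {
    public int length() { return days.length(); }

    public long getLong(int i) {
        return days.getLong(i) * unitsPerDay
             + seconds.getLong(i) * unitsPerSecond
             + subseconds.getLong(i);
    }
}
```

With millisecond multipliers (unitsPerDay = 86_400_000, unitsPerSecond = 1_000), a row of (days = 1, seconds = 1, subseconds = 500) reads back as 86_401_500.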

Pattern

Same top-level record shape as the rest of the lazy-decode session. The lazy record's dtype() is the parent Extension dtype so it slots transparently into the existing ExtensionStorage.epochInteger → TimestampExtensionDecoder.instant(...) pipeline (which already pattern-matches the LongArray case).

Test plan

  • ./mvnw verify — 13 modules SUCCESS, integration suite 40s green.
  • 3 new unit tests in LazyDateTimePartsLongArrayTest pass.
  • DateTimePartsEncodingEncoderTest round-trip passes (asserting the reassembled epoch).

🤖 Generated with Claude Code

…Array

The pre-existing DateTimePartsEncodingDecoder returned a
GenericArray wrapping the three children (days, seconds, subseconds),
but no consumer in the extension-decode path (ExtensionStorage,
TimestampExtensionDecoder, DateExtensionDecoder) knew how to reassemble
that shape back into the epoch count their accessors expect. The path
was effectively dead at scan time — the encoder tests round-tripped
the children individually but never reconstructed an epoch.

Add LazyDateTimePartsLongArray (record, implements LongArray) that holds
the three children plus the precomputed unitsPerDay / unitsPerSecond
multipliers and reassembles on demand:

    getLong(i) = days[i] * unitsPerDay
               + seconds[i] * unitsPerSecond
               + subseconds[i]

DateTimePartsArrays (package-private) centralises the per-row read so
each child can use whichever signed-integer ptype the encoder picked
(Byte / Short / Int / Long Array, optionally wrapped in MaskedArray).
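
A rough sketch of what such a centralised per-row read could look like. The `PartColumn` records and `PartReads.readLong` below are hypothetical names for illustration only, standing in for the project's Byte / Short / Int / Long arrays and the package-private helper; the MaskedArray unwrapping is left out.

```java
// Hypothetical child shapes (assumption); each wraps one primitive array.
interface PartColumn {}
record ByteParts(byte[] values) implements PartColumn {}
record ShortParts(short[] values) implements PartColumn {}
record IntParts(int[] values) implements PartColumn {}
record LongParts(long[] values) implements PartColumn {}

final class PartReads {
    private PartReads() {}

    // One place that widens any supported child to a signed long.
    // Java's primitive widening from byte/short/int to long preserves the sign.
    static long readLong(PartColumn column, int row) {
        if (column instanceof ByteParts b) return b.values()[row];
        if (column instanceof ShortParts s) return s.values()[row];
        if (column instanceof IntParts n) return n.values()[row];
        if (column instanceof LongParts l) return l.values()[row];
        throw new IllegalArgumentException("unsupported child: " + column);
    }
}
```

Keeping the dispatch in a single helper means the reassembly code never needs to know which width the encoder chose for each child.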

DateTimePartsEncodingDecoder parses the parent Extension dtype's
TimeUnit metadata byte, computes unitsPerSecond = TimeUnit.divisor()
(falling back to 1 for the Days unit, whose seconds and subseconds
children are zero) and unitsPerDay = 86_400 × unitsPerSecond, then
constructs the lazy record. No buffer allocation, no per-row copy.
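
The multiplier computation can be pictured as below. This is a sketch under stated assumptions: the `Unit` enum is a hypothetical mirror of TimeUnit, and its divisor values (units per second) are assumed rather than taken from the code; the Days fallback follows the description above literally.

```java
// Hypothetical mirror of TimeUnit; the divisor values are assumptions.
enum Unit {
    NANOSECONDS(1_000_000_000L),
    MICROSECONDS(1_000_000L),
    MILLISECONDS(1_000L),
    SECONDS(1L),
    DAYS(0L); // no meaningful per-second divisor; handled by the fallback below

    private final long divisor;

    Unit(long divisor) { this.divisor = divisor; }

    long divisor() { return divisor; }
}

// Both multipliers are computed once per array, not once per row.
record Multipliers(long unitsPerDay, long unitsPerSecond) {
    static final long SECONDS_PER_DAY = 86_400L;

    static Multipliers of(Unit unit) {
        // Days falls back to 1, as described: its seconds and subseconds
        // children are zero, so the per-second multiplier never contributes.
        long perSecond = (unit == Unit.DAYS) ? 1L : unit.divisor();
        return new Multipliers(SECONDS_PER_DAY * perSecond, perSecond);
    }
}
```

Under these assumed divisors, the largest multiplier is 86_400 × 1_000_000_000 for nanoseconds, which fits comfortably in a long.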

Now the extension-decode pipeline composes correctly: scanning a
vortex.datetimeparts-encoded column under a vortex.timestamp extension
produces a LongArray of reassembled epoch counts, which feeds into
TimestampExtensionDecoder.instant exactly like a Materialized child.

Updated DateTimePartsEncodingEncoderTest to assert the reassembled
epoch value instead of the (now hidden) per-child structure — the
behaviour the encoder is actually guaranteeing.

3 new unit tests in LazyDateTimePartsLongArrayTest cover the
millisecond reassembly, widening from narrower child ptypes, and the
fold reduction. ./mvnw verify green (13 modules, integration suite 40s).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dfa1 merged commit 8ab9ec7 into main Jun 16, 2026
6 checks passed
dfa1 deleted the feat/datetimeparts-lazy branch June 16, 2026 11:38